Data Science and Advanced Analytics

Melanie Murphy, Brian Calhoon, Dan Killian

What is data science? What is advanced analytics?

  • Data science: The extraction of learning from data

  • Advanced analytics: The application of methods that learn from the data

DALL-E’s view of the MSI data science team

“a data science team solving complex analytical problems while at the climbing gym, in the style of anime”

[Images: "Data science at the World Cup"; "Data science playing basketball"; "Data science at the climbing gym"]

Robust external demand for advanced analytics

  • USAID’s 2011 evaluation policy remains in force, offering the opportunity for impact evaluations

  • USAID references analytics in its solicitations (simulation, social network analysis, modeling)

  • New data sources require deeper understanding, processing, and analysis

Robust internal demand for advanced analytics

  • We lost our analysts!

  • MSI junior and mid-level staff want to gain analytical skills and experience, ideally through a dedicated career path

  • MSI needs to document and build its past performance and capabilities for new business

  • Need to justify and defend our analytical choices

Data science and advanced analytics

Three primary areas of action

  • Best-practice application of techniques to extract learning
  • Build capacity of interested staff
  • Document past practice to help win new business

Application of techniques to extract learning

Technique | Use
--- | ---
Multiple regression | Explain an outcome of interest after accounting for the influence of other factors
Ensemble regression | Predict an outcome of interest using a collection of regression models
Factor / principal component analysis | Reduce a set of correlated variables to a smaller number of 'factors' or 'components'
Item response theory | Validate that a set of items measures the hypothesized construct
Conditional inference tree | Search for statistically significant splits in the data across different pathways
Random forest | Identify the most salient predictors of an outcome of interest using a collection of decision trees (e.g., conditional inference trees)
Causal tree | Identify variation in a treatment effect in the form of a decision tree
Causal forest | Identify the most salient variation in treatment effects using a collection of causal trees
Bayesian network | Estimate conditional probabilities among a set of variables
Bayesian priors | Use stakeholder knowledge to co-create prior (baseline) values for an outcome of interest
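To illustrate the "multiple regression" row of the table above, here is a minimal sketch in Python, assuming NumPy is available. The data are simulated, and the variable names (`program`, `household_size`, `outcome`) and helper `fit_ols` are hypothetical, not taken from any MSI analysis. It shows what "after accounting for the influence of other factors" means in practice: when household size drives both enrollment and the outcome, the naive program estimate is inflated, while the adjusted estimate recovers the simulated effect of 1.5.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: return [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Hypothetical simulated data (not real program data)
rng = np.random.default_rng(0)
n = 500
household_size = rng.integers(1, 9, size=n)
# Larger households are more likely to enroll, so household size is a confounder
program = (household_size + rng.normal(0, 2, size=n) > 5).astype(float)
# True program effect is 1.5; household size also raises the outcome
outcome = 2.0 + 1.5 * program + 0.8 * household_size + rng.normal(0, 1, size=n)

naive = fit_ols(program.reshape(-1, 1), outcome)
adjusted = fit_ols(np.column_stack([program, household_size]), outcome)
print(f"Naive program effect:    {naive[1]:.2f}")
print(f"Adjusted program effect: {adjusted[1]:.2f}")
```

In a real analysis a package such as statsmodels would also report standard errors and diagnostics; the bare least-squares fit here only shows the adjustment logic.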
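For the "factor / principal component analysis" row of the table above, the following is a minimal sketch of principal component analysis only (factor analysis fits a separate latent-factor model and is not shown). It assumes NumPy is available; the survey items and the helper `principal_components` are hypothetical. Four items that share one underlying trait collapse largely onto a single component.

```python
import numpy as np

def principal_components(X, n_components):
    """Return component scores and the share of variance each component explains."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so differences in scale do not dominate
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes (loadings), ordered by variance explained
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Project the standardized data onto the leading axes
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]

# Hypothetical survey: four correlated items driven by one underlying trait
rng = np.random.default_rng(1)
trait = rng.normal(size=300)
items = np.column_stack([trait + rng.normal(0, 0.5, size=300) for _ in range(4)])

scores, explained = principal_components(items, n_components=1)
print(f"First component explains {explained[0]:.0%} of the variance")
```

Standardizing first makes this equivalent to working from the correlation matrix, which is the usual choice when items are on different scales.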
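For the "Bayesian priors" row of the table above, one common way to encode stakeholder knowledge about a proportion is a Beta prior that is updated with observed data (the conjugate Beta-Binomial model). This is a minimal sketch under that assumption, not necessarily the approach MSI uses: we assume stakeholders state an expected rate plus an "equivalent sample size" expressing their confidence. The names `BetaPrior`, `prior_from_stakeholders`, and `update`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaPrior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def prior_from_stakeholders(expected_rate: float, confidence_n: float) -> BetaPrior:
    """Turn an elicited rate and an 'equivalent sample size' into a Beta prior."""
    return BetaPrior(alpha=expected_rate * confidence_n,
                     beta=(1 - expected_rate) * confidence_n)

def update(prior: BetaPrior, successes: int, trials: int) -> BetaPrior:
    """Conjugate Beta-Binomial update: add observed successes and failures."""
    return BetaPrior(alpha=prior.alpha + successes,
                     beta=prior.beta + (trials - successes))

# Hypothetical: stakeholders expect about 30% uptake, held as firmly as 20 observations
prior = prior_from_stakeholders(expected_rate=0.30, confidence_n=20)
# Hypothetical survey result: 45 of 100 households adopted
posterior = update(prior, successes=45, trials=100)
print(f"Prior mean: {prior.mean:.3f}, posterior mean: {posterior.mean:.3f}")
```

The equivalent sample size makes the strength of the co-created prior explicit: a larger `confidence_n` means the data must be more plentiful to move the estimate.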

Build capacity of interested staff

  • Structured trainings on demand, ideally at least once per year
  • 4-12 brown bag sessions per year
  • Weekly office hours / on-demand consultation

Document past practice and identify new opportunities

  • Inventory of techniques, activities, and analysts
  • Review existing task pipeline for opportunities to apply technique
  • Demonstrate new techniques to expand performance and capabilities

Tasks

  • Build and maintain trackers recording each technique, where it was used, and which staff implemented it
  • Identify interested and capable staff
  • Develop comms on current performance and capabilities
  • Develop a how-to manual (cookbook) for the analytical workflow
  • Develop content for training and brown bag sessions

Where are we headed?

The increasing integration of human and machine intelligence to (hopefully) solve problems

Working with AI to set up your analyses

Hey GPT, scrape this data

Hey GPT, put the data into an interactive dashboard

Thank you!